1 Overview

For this session, you will need the following packages: gapminder, gganimate, plotly, and transformr.

Install them by calling

install.packages(c(
  "gapminder",
  "gganimate",
  "plotly",
  "transformr"
))

2 Renderer

For gganimate to be able to produce animated images, you need to install a renderer. gifski is usually the best choice, but can be somewhat tricky to install depending on what operating system you are on. If you aren’t able to install gifski for some reason, you can also use ImageMagick.

First, simply try to run

install.packages("gifski")

or

install.packages("magick")

If either of the above works without a hitch, then you’re done! If not, try the advanced instructions below.
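To see which of the two renderer packages ended up installed, a small check along the following lines may help. This is only a sketch: `pick_renderer` is a hypothetical helper name introduced here, not a function from gganimate.

```r
# Report which renderer package is installed, preferring gifski.
# `pick_renderer` is a hypothetical helper, not part of gganimate.
pick_renderer <- function() {
  if (requireNamespace("gifski", quietly = TRUE)) {
    "gifski"
  } else if (requireNamespace("magick", quietly = TRUE)) {
    "magick"
  } else {
    NA_character_
  }
}

backend <- pick_renderer()

if (is.na(backend)) {
  message("Neither gifski nor magick is installed yet.")
} else {
  message("Renderer package available: ", backend)
}
```

When rendering later on, the matching gganimate renderer can be passed explicitly, for example animate(p, renderer = gifski_renderer()) or animate(p, renderer = magick_renderer()).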

2.1 Advanced Instructions for Windows

Try to install ImageMagick by downloading and running the recommended installation file at https://imagemagick.org/script/download.php#windows.

After this, try running install.packages("magick") again.

2.2 Advanced Instructions for macOS

To install gifski on macOS, first make sure you have Homebrew installed. If you do not, open the macOS terminal and run

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

to install it. Afterwards, run

brew install rustc

and then install the gifski R package by running

install.packages("gifski")

as before.

If the solution above does not work, you can also try installing the gifski application from the Mac App Store.

You can also install ImageMagick on macOS using Homebrew. First make sure you’ve installed Homebrew (as outlined above) and then run

brew install imagemagick

2.3 Linux

Installing gifski is easy using snap. Simply go to https://snapcraft.io/gifski for instructions for any given distribution.

3 Gapminder

Do you remember the animated plots we produced in the introductory presentation workshop, based on Hans Rosling’s animated Gapminder visualization?

In this worked example, we’ll work out how to reproduce that plot as both an animated and an interactive visualization.

4 Animated Visualization

The dataset that we’ll use is available via the gapminder package. Loading gapminder makes the dataset directly available in an object called gapminder. These are the first few rows of the dataset.

library(gapminder)

head(gapminder)
## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

The variables should be self-explanatory.

Let’s jump right in! Create a bubble plot faceted on year (which we cut into groups), with population mapped to the size of the bubbles, and GDP per capita and life expectancy on the x and y axes respectively.

Here’s some code to get started:

library(tidyverse)

gapminder |>
  mutate(years = cut_interval(year, length = 5)) |>
  ggplot(...)
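If you get stuck, here is one possible way to fill in the plot. It is a sketch rather than the intended solution: the log scale on the x axis is borrowed from the plotly example later in this document, and the object name p_facets is made up for illustration. Only dplyr and ggplot2 are loaded here instead of the whole tidyverse.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

p_facets <-
  gapminder |>
  # cut_interval() bins the years into 5-year groups for faceting
  mutate(years = cut_interval(year, length = 5)) |>
  ggplot(aes(gdpPercap, lifeExp, size = pop)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() + # assumption: log scale, as in the later plotly plot
  facet_wrap(~years) +
  labs(x = "GDP per capita", y = "Life expectancy")

p_facets
```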

Just as I said in the first presentation, this visualization is not (yet) working out so well for us. Let’s make it animated instead. For this, we’ll use the gganimate package.

Build the plot as before, but now make it animated by adding the transition_time() function to the plot, mapping the animation to year. Also use title = "Year: {frame_time}" in your labs() call to animate the label, showing which year it is.
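A minimal sketch of what this could look like is shown here. The colour mapping to continent and the name anim are assumptions for illustration and not part of the exercise text.

```r
library(gapminder)
library(ggplot2)
library(gganimate)

anim <-
  ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, colour = continent)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  # {frame_time} is filled in by gganimate with the current value of year
  labs(
    title = "Year: {frame_time}",
    x = "GDP per capita",
    y = "Life expectancy"
  ) +
  # animate over the year variable instead of faceting on it
  transition_time(year)
```

Printing anim, or calling animate(anim), renders the animation with whichever renderer is available.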

If you think the plot is still crowded, we could alternatively use facets to separate continents. If you want to, you can make use of the country_colors object that is included in the gapminder package by adding the following line to your plot.

scale_colour_manual(values = country_colors, guide = "none")

Try to add ease_aes("cubic-in-out") to the plot to change the transition function and see what the difference is. There are other options available if you check out the documentation for ease_aes() too.
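Putting the last few suggestions together (facets by continent, country_colors, and a different easing function) might look roughly like the sketch below. The size range and alpha value are arbitrary choices, and anim_continents is just an illustrative name.

```r
library(gapminder)
library(ggplot2)
library(gganimate)

anim_continents <-
  ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7) +
  # country_colors ships with gapminder; hide the 142-entry legend
  scale_colour_manual(values = country_colors, guide = "none") +
  scale_size(range = c(2, 12), guide = "none") +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(
    title = "Year: {frame_time}",
    x = "GDP per capita",
    y = "Life expectancy"
  ) +
  transition_time(year) +
  # ease in and out of each year instead of moving linearly
  ease_aes("cubic-in-out")
```

Swapping "cubic-in-out" for "linear" and rendering both versions side by side is an easy way to see what the easing function changes.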

So far our plot does a good job of showing the trends among the various continents of the world but is hard to use if we are interested in one specific country. A remedy for this can be to use labels to let us identify which bubble belongs to which country. The large number of countries, however, means that it’s not a frightfully good idea to label all of them.

Instead, we’ll pick out the largest two countries (at the latest time stamp) on each continent and label those. First, we store the names of the countries in a vector, large_country_names.

The following steps first filter the dataset so that only observations from the latest year (max(year)) are kept, then group the dataset by continent, then slice it so that the two observations (countries) with the largest population (pop) in each group (continent) are kept, and finally pull out the country names using pull().

large_country_names <-
  gapminder |>
  filter(year == max(year)) |>
  group_by(continent) |>
  slice_max(pop, n = 2) |>
  pull(country)

large_country_names
##  [1] Nigeria       Egypt         United States Brazil        China        
##  [6] India         Germany       Turkey        Australia     New Zealand  
## 142 Levels: Afghanistan Albania Algeria Angola Argentina Australia ... Zimbabwe

Then we filter the original dataset to create a separate dataset for our labels.

large_countries <- filter(gapminder, country %in% large_country_names)
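As a quick sanity check of the label dataset (a sketch, not part of the original exercise), we can count the rows per country. The two steps above are repeated so that the snippet runs on its own; label_counts is an illustrative name.

```r
library(gapminder)
library(dplyr)

large_country_names <-
  gapminder |>
  filter(year == max(year)) |>
  group_by(continent) |>
  slice_max(pop, n = 2) |>
  pull(country)

large_countries <- filter(gapminder, country %in% large_country_names)

# Each labeled country should contribute one row per year in the data
label_counts <- count(large_countries, country)
label_counts
```

With five continents and two countries each, we expect ten countries, each with one row for every year in gapminder.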

Now it’s your turn to try to put everything together. Label the countries with geom_label_repel() from the ggrepel package, in order to avoid overlapping labels. Note that working with labels and animated visualizations is something of a challenge. I had to tweak the settings (mostly nudge_x and nudge_y) several times in order to get something that looks good.
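One way to wire the labels into the animation is sketched below. It is not the code behind the figure: the nudge_x and nudge_y values are placeholders that will almost certainly need tweaking, and anim_labels is an illustrative name.

```r
library(gapminder)
library(dplyr)
library(ggplot2)
library(gganimate)
library(ggrepel)

large_country_names <-
  gapminder |>
  filter(year == max(year)) |>
  group_by(continent) |>
  slice_max(pop, n = 2) |>
  pull(country)

large_countries <- filter(gapminder, country %in% large_country_names)

anim_labels <-
  ggplot(gapminder, aes(gdpPercap, lifeExp, colour = country)) +
  geom_point(aes(size = pop), alpha = 0.7) +
  # label only the selected countries; nudges are placeholder values
  geom_label_repel(
    aes(label = country),
    data = large_countries,
    nudge_x = 0.3,
    nudge_y = 5,
    show.legend = FALSE
  ) +
  scale_colour_manual(values = country_colors, guide = "none") +
  scale_size(range = c(2, 12), guide = "none") +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(
    title = "Year: {frame_time}",
    x = "GDP per capita",
    y = "Life expectancy"
  ) +
  transition_time(year) +
  ease_aes("cubic-in-out")
```

Because the x axis is on a log scale, nudge_x is applied in log10 units, which is worth keeping in mind when adjusting it.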

The final result should look something like the following figure.

Figure 1: Life expectancy and GDP per capita by country. The two largest countries on each continent (in terms of population in the latest year) have been labeled.

5 Interactive Visualization

Interactive visualizations are often effective, particularly when we want to visualize a complicated dataset such as this one. Here we’ll use the plotly package to do so, which, as you may recall from the lecture, works well in tandem with ggplot. First install the package.

install.packages("plotly")

Then load the package.

library(plotly)

Now we redraw the plot, adding an interactive slider to select the year using plotly. Make note of the additional mapping that we’ve added to geom_point(), namely frame, which is a special mapping that will let plotly know which variable to use to separate the visualization into frames.

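# size and colour are mapped here so that the scales below have an effect
p <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, colour = country)) +
  # frame tells plotly which variable separates the animation frames
  geom_point(aes(frame = year), alpha = 0.5) +
  scale_colour_manual(values = country_colors, guide = "none") +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(x = "GDP per capita", y = "Life expectancy")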

ggplotly(p)

Figure 2: An interactive visualization using plotly for the Gapminder data.

Notice how seamless the conversion of ggplots into interactive plots can be with the help of plotly.

Try to modify the plot by adding additional dummy mappings in the aes() call to the main ggplot() function, so that information on these variables shows up in the tooltips too.
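As a starting point, the sketch below uses plotly's unofficial text aesthetic to add the country name and population to the tooltip. The exact tooltip contents and the names p_tooltip and fig are illustrative choices, not the only way to solve the exercise.

```r
library(gapminder)
library(ggplot2)
library(plotly)

p_tooltip <-
  ggplot(
    gapminder,
    aes(
      gdpPercap,
      lifeExp,
      # dummy mapping: ignored by ggplot2 itself, picked up by ggplotly()
      text = paste0(country, "<br>Population: ", format(pop, big.mark = ","))
    )
  ) +
  geom_point(aes(frame = year), alpha = 0.5) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(x = "GDP per capita", y = "Life expectancy")

# Show the custom text together with the x and y values in the tooltip
fig <- ggplotly(p_tooltip, tooltip = c("text", "x", "y"))
fig
```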